Chapter 2 Point Pattern Analysis
In this script we will explore point pattern analysis (ppa) in R. The most common package for these analyses in R is spatstat (Baddeley and Turner 2005). Other introductions to ppa, and to spatstat specifically, are Chapter 4 in Fletcher and Fortin (2018), Chapter 11 in Pebesma and Bivand (2022), Baddeley and Turner (2005), and Baddeley, Rubak, and Turner (2015). There are some less common packages that we will not delve into here but which I do want to mention as they might be helpful to you at some future point: ads (Pélissier and Goreaud 2015), ecespa (Cruz Rot 2008), splancs (Rowlingson and Diggle 2022), and stpp (Gabriel et al. 2022).
2.1 The ppp class
Spatstat introduced its own object class to R, the ppp class.
Most functions which I will present in the following script only work with this object class.
To illustrate the package we will use a data set of bird occurrences in Germany that we already used in the [spatial subsetting] chapter, the Database of Global Administrative Areas (GADM), which we also used in the first lecture, and a DEM of Rhineland-Palatinate that we downloaded.
library(pacman)
p_load(sf, spatstat, mapview, dplyr, ggplot2, magrittr, terra, raster, maptools, data.table)

birds <- readRDS("data/meisen2.rds")
gadm <- st_read("data/gadm36_DEU_3_pk.gpkg")
## Reading layer `gadm36_DEU_3_pk' from data source `C:\Users\jonat\Documents\Uni\teaching\GIS\gisbook2\data\gadm36_DEU_3_pk.gpkg' using driver `GPKG'
## Simple feature collection with 4680 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 5.866251 ymin: 47.27012 xmax: 15.04181 ymax: 55.05653
## Geodetic CRS: WGS 84
DEM <- rast("data/DTM Germany_Rheinland-Pfalz 20m.tif")
We will need to prepare the bird data before we can start with the analyses. First we remove all entries with missing coordinates.
birds2 <- birds[!is.na(birds$decimalLatitude), ]
Then we remove duplicate observations, so that there is only one observation per location.
This is easiest with the unique() function from the data.table package.
While base R also has a unique() function, it cannot remove duplicates based on a subset of columns.
The data.table version accepts a by argument and returns only those rows that are not duplicated in the by columns.
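To make the difference concrete, here is a minimal base-R sketch (the data frame and its values are toy examples made up for the illustration) that performs the same column-wise deduplication with duplicated():

```r
# toy data: the second row duplicates the first row's coordinates
df <- data.frame(lat = c(50.1, 50.1, 49.9),
                 lon = c(7.5, 7.5, 7.6),
                 species = c("a", "b", "c"))

# base-R equivalent of data.table's unique(df, by = c("lat", "lon")):
# keep only rows whose coordinate pair has not appeared before
dedup <- df[!duplicated(df[, c("lat", "lon")]), ]
nrow(dedup)  # 2 rows remain
```

Note that the first occurrence of each coordinate pair is kept, so which duplicate survives depends on the row order.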
birds2 <- unique(birds2, by = c("decimalLatitude", "decimalLongitude"))
Now we can turn the data frame into an sf object.
birds2 <- st_as_sf(birds2, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)
Lastly, we subset the data to only observations within Rhineland-Palatinate.
birds2 <- st_filter(birds2, filter(gadm, NAME_1 == "Rheinland-Pfalz"))
Let's have a look.
mapview(birds2)
These bird observations are our starting point for the ppa.
We can use the as.ppp() function to convert the sf object into a ppp.
There is one caveat however: the data need to be in a projected coordinate system.
Otherwise as.ppp() will return an error.
Currently, our data are in a geographic coordinate system (latitude and longitude) so we need to transform them first.
I choose ETRS89 / UTM zone 32N (N-E) (EPSG 3044) because that is the CRS of the DEM.
birds2 <- st_transform(birds2, crs = 3044)
We will not cover marked point patterns in this tutorial. Therefore, we need to remove all variables except for the geometry.
birds2 <- dplyr::select(birds2, geometry)
Now we can create the ppp.
birds_ppp <- as.ppp(birds2)
birds_ppp
## Planar point pattern: 9547 points
## window: rectangle = [300361, 463668.4] x [5426570, 5641700] units
We see that we have a planar (i.e., 2-dimensional) point pattern with 9547 points, which is equal to the number of rows in birds2.
Besides the points, the ppp object also contains the window, the rectangle that includes all points.
We can plot the ppp using the base plot function.
plot(birds_ppp)
The window can be queried separately through the Window() function.
The resulting object has the class owin (observation window).
x <- Window(birds_ppp)
class(x)
## [1] "owin"
2.2 First order processes
The most basic first order property is the density, the number of points in an area.
We can use the quadratcount() function to divide our window into quadrats and count the points in each.
The number of quadrat rows and columns is determined through nx and ny.
First we compute the global density.
To this end we just need a single quadrat, so both nx and ny are one.
Q0 <- quadratcount(birds_ppp, nx= 1, ny=1)
plot(birds_ppp, pch=20, cols="grey70", main=NULL)
plot(Q0, add=TRUE) 
For the plot I have adjusted the symbol (pch), point color (cols) and plot title (main).
By varying the number of quadrats we can alter the number of local densities we compute.
Q1 <- quadratcount(birds_ppp, nx= 2, ny=2)
Q2 <- quadratcount(birds_ppp, nx= 3, ny=6)
Q3 <- quadratcount(birds_ppp, nx= 10, ny=20)
plot(birds_ppp, pch=20, cols="grey70", main = NULL)
plot(Q1, add=TRUE) 
plot(birds_ppp, pch=20, cols="grey70", main = NULL)
plot(Q2, add=TRUE) 
plot(birds_ppp, pch=20, cols="grey70", main = NULL)
plot(Q3, add=TRUE) 
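Conceptually, quadrat counting just bins the coordinates into a grid and tabulates the counts. A minimal base-R sketch of the same idea, using toy coordinates in a unit square and a 2x2 grid (the data are simulated purely for illustration):

```r
# toy point coordinates in a unit square
set.seed(1)
x <- runif(20)
y <- runif(20)

# assign each point to one of nx * ny grid cells
nx <- 2; ny <- 2
col_id <- cut(x, breaks = seq(0, 1, length.out = nx + 1), include.lowest = TRUE)
row_id <- cut(y, breaks = seq(0, 1, length.out = ny + 1), include.lowest = TRUE)

# cross-tabulate: one count per quadrat
counts <- table(row_id, col_id)
sum(counts)  # all 20 points are accounted for
```

quadratcount() does the same binning but keeps track of the window geometry, so it also handles irregular windows and tessellations.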
The intensity is the density per area and can be computed by intensity().
Q0.d <- intensity(Q0, image = T)
Q1.d <- intensity(Q1, image = T)
Q2.d <- intensity(Q2, image = T)
Q3.d <- intensity(Q3, image = T)
# density raster
plot(Q0.d, main = NULL) 
plot(Q1.d, main = NULL) 
plot(Q2.d, main = NULL) 
plot(Q3.d, main = NULL) 
The unit (points per square meter) is not very intuitive.
We can change it to points per square kilometer with the rescale() function.
birds_km <- spatstat.geom::rescale(X = birds_ppp, s = 1000, unitname = "km")
Now we get the same results but in the new and much more readable units.
Q <- quadratcount(birds_km, nx= 10, ny=20)
Q.d <- intensity(Q, image = T)
plot(Q.d, main = NULL) 
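Rescaling only changes the unit, not the pattern: an intensity in points per square meter multiplied by 10^6 gives points per square kilometer. A quick sanity check with a made-up intensity value:

```r
# converting an intensity from points per m^2 to points per km^2
# (the intensity value is made up for the illustration)
pts_per_m2 <- 3e-5
pts_per_km2 <- pts_per_m2 * 1e6   # 1 km^2 = 1e6 m^2
pts_per_km2  # 30 points per square kilometer
```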
We can also use predefined areas to compute density and intensity.
We will go through two different approaches here.
The first is based on a categorized raster and the second through a polygon layer.
The categorized raster is the DEM of Rhineland Palatinate we loaded in the beginning.
As the raster is quite large we will aggregate cells to increase cell sizes and decrease the resolution.
We loaded the raster with the function from the terra package (rast()).
terra is the leading package for working with raster data in R at the moment (see here for an introduction).
It supersedes the raster package which was the standard package before.
Each of the two packages has its own object class to store the raster internally.
The raster package uses the class RasterLayer and the terra package uses the class SpatRaster.
Some packages have not adjusted to the switch to terra yet and still require RasterLayer objects.
This is also the case for spatstat, which in turn uses its own raster object class, im.
Thus, we need to transform the SpatRaster to a RasterLayer, which can be done with the function raster(), and then transform the RasterLayer to an im with as.im().
DEM2 <- terra::aggregate(DEM, fact = 4)
DEM3 <- raster(DEM2)
DEM4 <- as.im(DEM3)
DEM4 is still a continuous raster where each cell stores an elevation value.
Now we will create a categorical raster which stores elevation classes instead of elevation values.
First, we need to define the breaks, i.e., which elevation values correspond to which classes.
As we have no a priori classification in mind here we will use the quartiles of the elevation values.
summary(values(DEM2))
## DTM Germany_Rheinland-Pfalz 20m
## Min. : 27.7
## 1st Qu.:223.1
## Median :318.2
## Mean :318.1
## 3rd Qu.:415.0
## Max. :815.9
## NA's :2787887
breaks <- c(-Inf, 223, 318, 415, Inf)
With the cut() function we assign the cells of DEM4 to four classes labelled 1 to 4 according to the breaks defined in breaks.
Afterwards we need to transform the result into another object class one more time with tess().
elev_class <- cut(DEM4, breaks = breaks, labels = 1:4)
elev_class <- tess(image = elev_class)
plot(elev_class)
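Because cut() behaves the same on a plain numeric vector as on an im object, the classification is easy to verify in isolation. A sketch with toy elevation values (one per class) and the breaks defined above:

```r
# toy elevation values spanning all four classes
elev <- c(100, 250, 350, 500)

# same breaks as above: DEM quartiles, open at both ends
breaks <- c(-Inf, 223, 318, 415, Inf)

# cut() assigns each value to the interval it falls into;
# by default intervals are right-closed, i.e. (a, b]
classes <- cut(elev, breaks = breaks, labels = 1:4)
classes  # 1 2 3 4
```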
Now we can use the classified elevation raster as quadrats for the quadratcount() function.
Q_elev <- quadratcount(birds_ppp, tess = elev_class)
plot(Q_elev)
Q_elev2 <- intensity(Q_elev, image = T)
plot(Q_elev2)
Based on this we can already see that the number of observations tends to decrease along the elevation classes.
As a second example we will use sf polygons to define the quadrats.
Here we will do so with the districts within Rhineland-Palatinate.
In the end, we still need to provide quadratcount() with an object of class tess.
This can be created with
# subset all of Germany to just RLP
rlp <- filter(gadm, NAME_1 == "Rheinland-Pfalz")
# assign new CRS that conforms to the CRS of birds
rlp <- st_transform(rlp, 3044)
# Aggregate smaller district units to NAME_2 level
rlp %<>%
group_by(NAME_2) %>%
summarise(geom = st_union(geom))
# sf polygons can be transformed to owin objects with as.owin().
# We create a list where each entry is one district as an owin object.
rlp_list <- list()
for(i in 1:nrow(rlp)){
rlp_list[[i]] <- as.owin(rlp[i, ])
}
# convert owin to tess
rlp_owin <- as.tess(rlp_list)
Q <- quadratcount(birds_ppp, tess = rlp_owin)
cl <- interp.colours(c("lightyellow", "orange" ,"red"), length(rlp_list))
plot(Q, col = cl)
Q.d <- intensity(Q, image = T)
plot(Q.d, col = cl, main = NULL)
2.2.1 Kernel density
We can create continuous density surfaces with kernel density estimation.
K1 <- density(birds_ppp)
plot(K1, main = NULL)
contour(K1, add=TRUE)
We can adjust the bandwidth with the sigma argument.
In the following, we create five kernels with increasing bandwidths to illustrate the difference.
par(mfrow = c(2,3))
for (i in seq(10000, 50000, by = 10000)) {
x = density(birds_ppp, sigma = i)
plot(x, main = paste0("bandwidth =", i))
contour(x, add = T)
}
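Conceptually, a kernel density estimate places a kernel on every observation and averages them; the bandwidth sigma controls how wide each kernel is. A one-dimensional base-R sketch (with toy observations) that makes this explicit:

```r
# toy observations
pts <- c(1, 2, 2.5, 8)

# kernel density estimate at position x: the average of Gaussian
# bumps centered on the observations, with standard deviation sigma
kde <- function(x, pts, sigma) {
  mean(dnorm(x, mean = pts, sd = sigma))
}

# evaluate on a grid with a narrow and a wide bandwidth
grid <- seq(-5, 15, by = 0.01)
dens_narrow <- sapply(grid, kde, pts = pts, sigma = 0.2)
dens_wide   <- sapply(grid, kde, pts = pts, sigma = 2)

# both estimates integrate to (approximately) one
sum(dens_narrow) * 0.01
sum(dens_wide) * 0.01
```

The narrow bandwidth produces a spiky estimate that hugs the individual points, while the wide bandwidth smooths them into one broad bump; density.ppp() does the same in two dimensions.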
We can also compare different kernel functions in the same way.
For spatstat's density function, there are four different kernel functions available:
1) Gaussian (the default option), 2) quartic, 3) disc, and 4) Epanechnikov.
par(mfrow = c(2,2))
for (i in c("gaussian", "quartic", "disc", "epanechnikov")) {
x = density(birds_ppp, sigma = 20000, kernel = i)
plot(x, main = paste0("Kernel =", i))
contour(x, add = T)
}
We can estimate the relationship between the intensity of a spatial point process and an environmental variable with the rhohat() function.
This function fits a non-parametric model that does not assume a specific functional form like linear or quadratic.
rho <- rhohat(birds_ppp, DEM4, method ="ratio")
plot(rho)
We see four different lines: \(\hat{\rho}\) is the estimated mean intensity as a function of the elevation, \(\bar{\rho}\) the overall mean intensity, and \(\rho_{hi}\) and \(\rho_{lo}\) are the upper and lower bounds of the 95% confidence interval, respectively. We can use this model to predict the density for each cell of DEM4.
pred <- predict(rho)
plot(raster(pred))
Again, we see that the intensity is highest in lowland areas and decreases with increasing elevation.
Alternatively, we can use a Poisson point process model (PPM) for prediction. Unlike the non-parametric model, the Poisson point process assumes a functional form: the number of points within a given region follows the Poisson distribution. The model is written as: \[\mathbf{P}(N(B)=n)=\frac{\lambda^{n}(\nu(B))^{n}}{n !} \exp (-\lambda \nu(B))\] where \(\mathbf{P}(N(B)=n)\) is the probability that the number of points in area \(B\) (\(N(B)\)) equals \(n\), \(\lambda\) is the intensity, the rate parameter of the Poisson distribution and therefore the parameter that is estimated when we fit the model, and \(\nu(B)\) is the area of \(B\). When running this command, there may be a warning message. This is not a problem.
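The formula is simply the Poisson probability mass function with mean \(\lambda \nu(B)\). A quick numeric check with made-up values (an intensity of 2 points per square kilometer and a region of 3 square kilometers):

```r
lambda <- 2   # intensity: points per square kilometer (toy value)
area   <- 3   # nu(B): area of region B in square kilometers (toy value)
n      <- 5   # observed number of points

# the formula from the text, written out term by term
p_manual <- (lambda * area)^n / factorial(n) * exp(-lambda * area)

# the same probability via R's built-in Poisson pmf
p_dpois <- dpois(n, lambda = lambda * area)

all.equal(p_manual, p_dpois)  # TRUE
```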
PPM1 <- ppm(birds_ppp ~ DEM4)
## Warning: Values of the covariate 'DEM4' were NA or undefined at 33% (16530 out of 49551) of the quadrature points. Occurred while
## executing: ppm.ppp(Q = birds_ppp, trend = ~DEM4, data = NULL, interaction = NULL)
The effectfun() function now calculates the trend in bird intensity with increasing elevation.
PPM1_effect <- effectfun(PPM1, "DEM4", se.fit=TRUE)
plot(PPM1_effect, las=1)
Again we have a mean estimate as a function of the elevation (\(\hat{\lambda}\)), as well as the upper and lower bounds of the 95% confidence interval (\(\lambda_{hi}\) and \(\lambda_{lo}\)). The general form is similar to the non-parametric model, but the decline in intensity is slower. Both methods thus come to similar conclusions:
par(mfrow = c(1,2))
plot(rho, main = "non-parametric model")
plot(PPM1_effect, main = "PPM" )
pred_ppm <- predict(PPM1)
plot(raster(pred), col = cl, main = "non-parametric model")
plot(pred_ppm, col = cl, main = "PPM")
2.3 Second order processes
Now we can turn to the second order processes, i.e., to metrics and functions that consider the location of points with respect to other points.
2.3.1 Average nearest neighbor
For the average nearest neighbor (ANN) method we use the nndist() function.
It calculates the distance from each point to the nearest other point.
The argument k indicates that we are interested in the nearest neighbor.
With k = 2 we could calculate the distance to the second nearest neighbor.
par(mfrow = c(1,1))
NN <- nndist(birds_ppp, k = 1)
ANN <- mean(nndist(birds_ppp, k = 1))
plot(sort(NN))
abline(h = ANN, col = "red", lwd = 2)
We can see that most points are close to other ones.
A few points are isolated, with distances of up to approximately 10 kilometers to the next observation.
In total we have 9547 observations, so there are 9546 (n-1) potential distance classes (i.e., values of k) we could compute.
As this seems excessive, we will content ourselves with 999 distance classes to see how distance increases with increasing degrees of neighborhood.
We can call nndist() with a vector as k.
The function computes the distance for each observation and each value of k and returns them in a matrix.
Each row is one observation and each column is one value of k.
As we are interested in the mean distance per k value, we summarize this with the apply() function.
apply() takes the matrix we compute with nndist() and applies the function in FUN to each column.
It applies the function to columns because we set the MARGIN to 2.
If we had set MARGIN to 1, apply() would have computed row-wise means.
n <- 999
NN <- nndist(birds_ppp, k = 1:n)
ANN <- apply(X = NN, MARGIN = 2, FUN = mean)
plot(ANN ~ eval(1:n), type = "l", main=NULL, las=1)
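At its core, nndist() computes a pairwise distance matrix and takes the row-wise minimum (for k = 1). A base-R sketch with three toy points, two close together and one isolated:

```r
# three toy points: the first two are close, the third is far away
xy <- cbind(x = c(0, 1, 10),
            y = c(0, 0, 0))

# full pairwise distance matrix
d <- as.matrix(dist(xy))
diag(d) <- Inf  # a point is not its own neighbor

# distance from each point to its nearest neighbor (k = 1)
nn1 <- apply(d, 1, min)
nn1   # 1, 1, 9

# the average nearest neighbor distance
mean(nn1)
```

For larger k, one would sort each row and take the k-th smallest entry, which is what nndist() returns column by column when k is a vector.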
2.3.2 The K-function
Next we will use the K, L, and G functions.
All three are implemented as functions in spatstat.
K <- Kest(birds_ppp, correction = "isotropic")
plot(K)
The estimated K is always larger than that of a Poisson-distributed pattern. This indicates clustered observations.
This is confirmed by the L function. The formula . - r ~ r plots L(r) - r against r, so that a Poisson pattern corresponds to a horizontal line at zero.
L <- Lest(birds_ppp, correction = "isotropic")
plot(L, . -r ~ r)
Lastly, the G-function:
G <- Gest(birds_ppp)
plot(G)
For the G-function, different estimators are displayed. The first three lines are different estimators of the G-function and are generally in agreement in this case. The last line (\(G_{pois}\)) is the G-function of a Poisson point process. As with the K and L functions before, the fact that the estimates for our data are higher than those for the Poisson process indicates that our data are clustered.
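For intuition, the uncorrected estimator behind Kest() simply counts point pairs within distance r and scales by area/n². A naive base-R sketch on a simulated pattern (note that spatstat's isotropic correction additionally reweights pairs near the window edge, which this sketch omits):

```r
set.seed(42)

# toy pattern: n uniformly random points in a unit square
n <- 100
x <- runif(n); y <- runif(n)
area <- 1

# pairwise distances, self-distances excluded
d <- as.matrix(dist(cbind(x, y)))
diag(d) <- Inf

# naive K estimate: number of ordered pairs within r, scaled by area/n^2
# (no edge correction)
K_naive <- function(r) area / n^2 * sum(d <= r)

r_grid <- seq(0.05, 0.25, by = 0.05)
K_hat <- sapply(r_grid, K_naive)

# for a homogeneous Poisson pattern, K(r) is close to pi * r^2
cbind(r = r_grid, K_hat = K_hat, K_pois = pi * r_grid^2)
```

Because this sketch simulates a completely random pattern, K_hat tracks pi * r^2; for the clustered bird data, the estimate lies above that curve.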
2.3.3 Moran's I
Lastly, we will have a look at Moran's I, a statistic that measures spatial autocorrelation.
This will be different from the other functions we covered in this tutorial so far, because it requires a different package, spdep, and we will need a mark.
Thus we need to go back to the bird data and keep the individualCount variable.
library(spdep)
birds3 <-
birds |>
filter(!is.na(decimalLatitude) & !is.na(individualCount)) |>
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) |>
st_filter(filter(gadm, NAME_1 == "Rheinland-Pfalz")) |>
dplyr::select(count = individualCount)
birds3 %<>%
mutate(x.coord = st_coordinates(birds3)[,1],
y.coord = st_coordinates(birds3)[,2])
setDT(birds3)
birds3 %<>% unique(by = c("x.coord", "y.coord")) %>%
st_as_sf() %>%
st_drop_geometry()
After preparing the bird data we first need to create a neighborhood list for our data.
This works in three steps:
1. identify the nearest (or kth) neighbor with knearneigh().
2. Turn knn object created by knearneigh() into a neighbors list of class nb with knn2nb().
3. Add spatial weights to the nb object with nb2listw().
In our case the spatial weights simply reinforce the neighborhood scheme: only the nearest neighbor receives a positive weight; all other observations get a weight of zero.
Then finally, we can use moran.test() to compute Moran’s I.
We provide the function with the variable of interest (the mark) and the weighted list of neighbors.
knn <- knearneigh(birds3, k = 1)
nb <- knn2nb(knn)
listw <- nb2listw(nb)
moran.test(x = birds3$count,
           listw = listw)
##
## Moran I test under randomisation
##
## data: birds3$count
## weights: listw
##
## Moran I statistic standard deviate = 23.907, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.978453524 -0.001047120 0.001678658
From these results we can see that nearby points are more similar than would be expected by chance (p-value < 0.05). Remember that a Moran's I of 0 indicates a random pattern, values near 1 indicate that similar values cluster together, and values near -1 indicate perfect dispersion.
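For intuition, Moran's I can also be computed by hand as \(I = \frac{n}{W} \cdot \frac{\sum_{ij} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}\), where \(W\) is the sum of all weights. A base-R sketch with toy values and a toy weight matrix in which each observation has exactly one neighbor, mirroring the k = 1 scheme above:

```r
# toy values at four locations
x <- c(10, 12, 30, 31)

# toy binary weights: points 1 & 2 are mutual neighbors, as are 3 & 4
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 1
w[3, 4] <- w[4, 3] <- 1

n <- length(x)
xc <- x - mean(x)   # centered values
W <- sum(w)         # total weight

# Moran's I: weighted cross-products relative to the overall variance
I <- (n / W) * sum(w * outer(xc, xc)) / sum(xc^2)
I  # close to 1: neighboring values are similar
```

Here the value is close to 1 because each pair of neighbors carries similar values, which is the same situation moran.test() detected in the bird counts.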